Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation
نویسندگان
چکیده
Recent work in computer vision has yielded impressive results in automatically describing images with natural language. Most of these systems generate captions in a single language, requiring multiple language-specific models to build a multilingual captioning system. We propose a very simple technique to build a single unified model across languages, using artificial tokens to control the language, making the captioning system more compact. We evaluate our approach on generating English and Japanese captions, and show that a typical neural captioning architecture is capable of learning a single model that can switch between two different languages.
منابع مشابه
Generating Image Descriptions using Multilingual Data
In this paper we explore several neural network architectures for the WMT 2017 multimodal translation sub-task on multilingual image caption generation. The goal of the task is to generate image captions in German, using a training corpus of images with captions in both English and German. We explore several models which attempt to generate captions for both languages, ignoring the English outp...
متن کاملCross-Lingual Image Caption Generation
Automatically generating a natural language description of an image is a fundamental problem in artificial intelligence. This task involves both computer vision and natural language processing and is called “image caption generation.” Research on image caption generation has typically focused on taking in an image and generating a caption in English as existing image caption corpora are mostly ...
متن کاملMultilingual Multimodal Language Processing Using Neural Networks
We live in an increasingly multilingual multimodal world where it is common to find multiple views of the same entity across modalities and languages. For example, news articles which get published in multiple languages are essentially different views of the same entity. Similarly, video, audio and multilingual subtitles are multiple views of the same movie clip. Given the proliferation of such...
متن کاملUsing Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine
Purpose: the current research aimed to compare the effectiveness of various tags and codes for retrieving images from the Google. Design/methodology: selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features have been apportioned for each group of images separately. In this regard, each group image surr...
متن کاملKeyword Generation for Biomedical Image Retrieval with Recurrent Neural Networks
This paper presents the modeling approaches performed by the FHDO Biomedical Computer Science Group (BCSG) for the caption prediction task at ImageCLEF 2017. The goal of the caption prediction task is to recreate original image captions by detecting the interplay of present visible elements. A large-scale collection of 164,614 biomedical images, represented as imageID caption pairs, extracted f...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1706.06275 شماره
صفحات -
تاریخ انتشار 2017